Avoid reading unusually large parquet pages #27303
Branch updated from d85e703 to 4544755.
Please remove "feat:" from the commit message. It is not a convention we use.
Ok.
Branch updated from 4544755 to 04989ed.
sopel39 left a comment:
lgtm % @raunaqmorarka PTAL too
lib/trino-parquet/src/main/java/io/trino/parquet/reader/PageReader.java (outdated, resolved)
one comment
Branch updated from 5fc07bc to 51b8cdb.
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeSessionProperties.java (outdated, resolved)
lib/trino-parquet/src/test/java/io/trino/parquet/reader/TestPageReader.java (outdated, resolved)
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeSessionProperties.java (outdated, resolved)
lib/trino-parquet/src/main/java/io/trino/parquet/ParquetReaderOptions.java (outdated, resolved)
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetColumnChunkIterator.java (outdated, resolved)
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetColumnChunkIterator.java (outdated, resolved)
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetColumnChunkIterator.java (outdated, resolved)
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetColumnChunkIterator.java (outdated, resolved)
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeSessionProperties.java (outdated, resolved)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetReaderConfig.java (outdated, resolved)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetReaderConfig.java (outdated, resolved)
Branch updated from 51b8cdb to 0e5b68e.
sopel39 left a comment:
lgtm % comments
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetReaderConfig.java (resolved)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetReaderConfig.java (resolved)
Branch updated from 0e5b68e to 600cce9.
raunaqmorarka left a comment:
minor comments, lgtm otherwise
lib/trino-parquet/src/main/java/io/trino/parquet/ParquetReaderOptions.java (resolved)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetReaderConfig.java (outdated, resolved)
Please squash the commits; we don't need a separate commit for adding the docs entry.
Reading unusually large parquet pages can lead to workers going into full GC and crashing. This change adds a guard rail to fail reads of such files gracefully.
Branch updated from 600cce9 to e9095ab.
Description
This PR implements page-level size limits for Parquet file reading to prevent
out-of-memory errors caused by extremely large pages.
Background:
process
Why page-level instead of column-level:
ParquetColumnChunkIterator, before page objects are created
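As a rough illustration of the page-level approach, here is a minimal, self-contained sketch that checks each page header's uncompressed size against a limit before allocating a buffer for that page. PageHeader, PageTooLargeException and readPages are hypothetical stand-ins, not Trino's ParquetColumnChunkIterator or ParquetCorruptionException, and a real reader would decode headers from the column chunk bytes rather than take them from a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not Trino's actual classes.
final class PageSizeCheckSketch {

    /** Stand-in for Trino's ParquetCorruptionException (assumption). */
    static final class PageTooLargeException extends RuntimeException {
        PageTooLargeException(String message) {
            super(message);
        }
    }

    /** Minimal stand-in for the sizes carried by a Parquet page header. */
    record PageHeader(int compressedSize, int uncompressedSize) {}

    /**
     * Walks the pages of one column chunk and validates each header
     * before any buffer for that page is allocated.
     */
    static List<byte[]> readPages(List<PageHeader> headers, long maxPageSizeInBytes) {
        List<byte[]> pages = new ArrayList<>();
        for (int i = 0; i < headers.size(); i++) {
            PageHeader header = headers.get(i);
            // Compare the uncompressed size: that is what the page occupies once decoded.
            if (header.uncompressedSize() > maxPageSizeInBytes) {
                throw new PageTooLargeException(String.format(
                        "Page %d has uncompressed size %d bytes, exceeding the limit of %d bytes",
                        i, header.uncompressedSize(), maxPageSizeInBytes));
            }
            // Only after the check passes is memory allocated for the page payload.
            pages.add(new byte[header.compressedSize()]);
        }
        return pages;
    }

    public static void main(String[] args) {
        long limit = 1024;
        List<PageHeader> headers = List.of(new PageHeader(100, 512), new PageHeader(300, 4096));
        try {
            readPages(headers, limit);
        }
        catch (PageTooLargeException e) {
            // Prints: Page 1 has uncompressed size 4096 bytes, exceeding the limit of 1024 bytes
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the uncompressed size rather than the compressed one matters because a highly compressible page can be small on disk yet very large once decompressed.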
Changes:
- Added a maxPageSizeInBytes parameter to ParquetColumnChunkIterator to validate uncompressed page size
- Throw ParquetCorruptionException when a page exceeds the configured limit
- Session property parquet_max_page_size and config property parquet.max-page-read-size (default: 500MB) for:
- Added unit test TestPageReader.testPageSizeLimit() to verify the validation logic
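The limit is exposed both as a config property and as a session property. A minimal sketch of how the two could interact, assuming the session value (when set) simply takes precedence over the configured default for a single query, and assuming "500MB" means binary megabytes. MaxPageSizeSettings and effectiveMaxPageSize are hypothetical names, not Trino's classes.

```java
import java.util.Optional;

// Hypothetical sketch: a session override taking precedence over a config default.
final class MaxPageSizeSettings {
    // Assumption: 500MB interpreted as 500 * 1024 * 1024 bytes.
    static final long DEFAULT_MAX_PAGE_READ_SIZE = 500L * 1024 * 1024;

    private final long configuredMaxPageReadSize;

    MaxPageSizeSettings(long configuredMaxPageReadSize) {
        if (configuredMaxPageReadSize <= 0) {
            throw new IllegalArgumentException("max page read size must be positive");
        }
        this.configuredMaxPageReadSize = configuredMaxPageReadSize;
    }

    MaxPageSizeSettings() {
        this(DEFAULT_MAX_PAGE_READ_SIZE);
    }

    /** Returns the session value if one is set, otherwise the configured value. */
    long effectiveMaxPageSize(Optional<Long> sessionOverride) {
        return sessionOverride.orElse(configuredMaxPageReadSize);
    }

    public static void main(String[] args) {
        MaxPageSizeSettings settings = new MaxPageSizeSettings();
        System.out.println(settings.effectiveMaxPageSize(Optional.empty())); // 524288000
        System.out.println(settings.effectiveMaxPageSize(Optional.of(1024L))); // 1024
    }
}
```

Keeping the config value as the fallback lets operators set a cluster-wide guard rail, while a session can raise it for a query that legitimately reads large pages.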
Additional context and related issues
legitimate large pages
but before creating page objects
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: